Memory Management Internals

Why Does a Python Process Use 800MB for 10 Million Small Integers?

This is a real scenario. You run a data pipeline that reads 10 million integer IDs from a database and holds them in a Python list. A naive estimate: 10M × 8 bytes (int64) = 80MB. Your process's RSS is 800MB. Where did the other 720MB go?

Let us trace exactly where:

import tracemalloc
import sys
import os

tracemalloc.start()

# Create 10 million Python integers
# (deliberately using values > 256 to avoid the small integer cache)
ids = list(range(10_000_000, 20_000_000))

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics('lineno')
for stat in stats[:5]:
    print(stat)

# Also check process RSS
with open(f'/proc/{os.getpid()}/status') as f:  # Linux only
    for line in f:
        if 'VmRSS' in line:
            print(line.strip())

The breakdown:

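The list itself: sys.getsizeof(ids) = 10M × 8 bytes (pointers) + 56 bytes overhead ≈ 80MB
Each integer object: Each PyLongObject for an 8-digit integer is 28 bytes. That is 10M × 28 = 280MB
Allocator overhead: pymalloc divides memory into 4KB pools. For 28-byte objects (fitting in 32-byte blocks), each pool holds about 4096/32 = 128 objects (ignoring the pool header). Total pools: 10M/128 = 78,125 pools = 305MB of pool space
Arena overhead: Pools are grouped into 256KB arenas. Some arena space is only partially used = ~50MB
Process overhead: Python runtime, stdlib, interpreter state, other allocations = ~80MB

Total: 80 + 280 + 305 + 50 + 80 ≈ 795MB

Treat this sum as a rough upper bound rather than an exact ledger: the 280MB of integer payload is stored inside the 305MB of pool blocks, so those two lines overlap. With tracing enabled as in the script above, tracemalloc's own per-allocation bookkeeping is also part of the measured RSS.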
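The arithmetic behind the list and integer lines can be reproduced with a short script. This is a minimal sketch, not a measurement: the helper names round_up and estimate_int_list_mb are hypothetical, and the defaults (28-byte ints, 8-byte pointers, 8-byte size-class steps) are assumptions taken from the figures in this section. The block figure already contains the payload, so the two are not added together. Results are in units of 10^6 bytes, which is why the pool space shows as 320 rather than the 305 quoted above (the same quantity in units of 2^20 bytes).

```python
def round_up(size, step):
    """Round size up to the next multiple of step."""
    return (size + step - 1) // step * step


def estimate_int_list_mb(n, int_size=28, pointer_size=8, block_step=8):
    """Estimate memory for a list of n distinct ints, in units of 10**6 bytes.

    Assumed figures (from the text): 28-byte int objects, 8-byte list
    pointers, pymalloc blocks rounded up to a multiple of block_step.
    """
    list_bytes = n * pointer_size                      # the list's pointer array
    payload_bytes = n * int_size                       # what sys.getsizeof reports per int
    block_bytes = n * round_up(int_size, block_step)   # what pymalloc hands out per int
    return {
        "list": list_bytes / 1e6,
        "int payload": payload_bytes / 1e6,
        "int blocks": block_bytes / 1e6,
        "list + blocks": (list_bytes + block_bytes) / 1e6,
    }


estimate = estimate_int_list_mb(10_000_000)
for label, megabytes in estimate.items():
    print(f"{label:>14}: {megabytes:7.1f} MB")
```

Anything the process uses beyond the "list + blocks" figure comes from the remaining lines of the breakdown (partially used arenas, the interpreter itself) and from any profiling tools that are active.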

This is not a bug. This is Python's memory model doing exactly what it is designed to do. Understanding it lets you predict memory usage and diagnose production incidents.

Memory Allocator Layers

CPython uses a layered memory allocation system. From top to bottom:

Layer 4: Python Objects
  sys.getsizeof() measures this layer
  PyLongObject, PyListObject, etc.
  Object-specific free lists (int, float, list)
         │
         ▼
Layer 3: Object-specific allocators
  Each type has a tp_alloc function
  Calls into pymalloc or directly to malloc
         │
         ▼
Layer 2: pymalloc (Objects/obmalloc.c)
  Handles allocations ≤ 512 bytes (Python 3.3+: ≤ 512B; earlier: ≤ 256B)
  Uses arenas → pools → blocks
  Avoids calling OS malloc for small objects (very fast)
         │
         ▼
Layer 1: C library malloc (glibc malloc / jemalloc / mimalloc)
  pymalloc calls malloc() to get large arena chunks
  Used directly for allocations > 512 bytes
         │
         ▼
Layer 0: Operating system (mmap / brk)
  malloc calls mmap() or brk() to get pages from the kernel
  Minimum allocation unit: 4KB page

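The key insight: Python almost never calls malloc() for individual small objects. It obtains memory one 256KB arena at a time (shown as malloc() in the simplified picture above; on platforms that support it, CPython maps arenas with mmap() directly), then subdivides those arenas internally. This is much faster (no allocator or kernel call per object) and has better cache locality.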

`pymalloc`: Arenas, Pools, and Blocks

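pymalloc is CPython's custom small-object allocator defined in Objects/obmalloc.c. It handles all allocations of 512 bytes or less and is tuned specifically for Python's allocation patterns. The sizes used in this lesson (256KB arenas, 4KB pools) are the classic values; 64-bit builds of CPython 3.10 and later use larger arenas and pools, but the hierarchy is the same.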

The three-level hierarchy:

┌─────────────────────────────────────────────────────────────────────┐
│ ARENA (256KB = 262,144 bytes)                                        │
│ Allocated from OS via malloc()                                       │
│ Holds 64 pools of 4KB; arenas are tracked in the arenas[] array     │
│                                                                      │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐       ┌────────────┐  │
│  │  POOL 1    │ │  POOL 2    │ │  POOL 3    │  ...  │  POOL 64   │  │
│  │  (4KB)     │ │  (4KB)     │ │  (4KB)     │       │  (4KB)     │  │
│  │            │ │            │ │            │       │            │  │
│  │ block size │ │ block size │ │ block size │       │ block size │  │
│  │    32B     │ │    48B     │ │    64B     │       │   varies   │  │
│  │            │ │            │ │            │       │            │  │
│  │ [32B][32B] │ │ [48B][48B] │ │ [64B][64B] │       │            │  │
│  │ [32B][32B] │ │ [48B][48B] │ │ [64B][64B] │       │            │  │
│  │ [32B] ...  │ │ [48B] ...  │ │ [64B] ...  │       │            │  │
│  └────────────┘ └────────────┘ └────────────┘       └────────────┘  │
│   4096/32=128   4096/48=85     4096/64=64                           │
│   objects/pool  objects/pool   objects/pool                         │
└─────────────────────────────────────────────────────────────────────┘

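Block sizes are quantised to 8-byte multiples (size classes). 64-bit builds since CPython 3.8 use 16-byte steps instead, which halves the number of size classes. The per-pool counts below ignore the small pool header: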

Request size	Block size allocated	Objects per 4KB pool
1–8 bytes	8 bytes	512
9–16 bytes	16 bytes	256
17–24 bytes	24 bytes	170
25–32 bytes	32 bytes	128
33–40 bytes	40 bytes	102
41–48 bytes	48 bytes	85
...8-byte steps...	...	...
505–512 bytes	512 bytes	8
> 512 bytes	(delegates to malloc)	-
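The table can be regenerated from two constants. The sketch below assumes the same simplifications as the table itself (8-byte steps, a 4096-byte pool with the header ignored); block_size and blocks_per_pool are illustrative helper names, not CPython functions.

```python
POOL_SIZE = 4096                # bytes per pool, as in the table (header ignored)
STEP = 8                        # size-class step assumed by the table
SMALL_REQUEST_THRESHOLD = 512   # larger requests are delegated to malloc


def block_size(request):
    """Return the block size used for a request, or None if malloc handles it."""
    if request < 1:
        raise ValueError("request must be at least 1 byte")
    if request > SMALL_REQUEST_THRESHOLD:
        return None
    return (request + STEP - 1) // STEP * STEP   # round up to a multiple of STEP


def blocks_per_pool(request):
    """Return how many blocks of the request's size class fit in one pool."""
    size = block_size(request)
    return None if size is None else POOL_SIZE // size


for request in (1, 20, 28, 40, 48, 512, 513):
    print(request, block_size(request), blocks_per_pool(request))
```

A 28-byte integer lands in the 32-byte class, which is where the 128-objects-per-pool figure used earlier comes from.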

Pool header (at the start of each pool):

struct pool_header {
    union { block *_padding; uint count; } ref;  // Number of allocated blocks
    block *freeblock;                             // Head of free list
    struct pool_header *nextpool;                 // Next pool for this size class
    struct pool_header *prevpool;                 // Previous pool for this size class
    uint arenaindex;                              // Which arena this pool belongs to
    uint szidx;                                   // Size class index (0-63)
    uint nextoffset;                              // Offset of next never-used block
    uint maxnextoffset;                           // Largest valid nextoffset
};

typedef struct pool_header *poolp;                // poolp is a POINTER to a pool header

When you allocate a small object:

Determine size class from allocation size (round up to 8-byte boundary)
Check usedpools[size_class] for a pool with a free block
If found: pop from free list (pool->freeblock), return to caller - no syscall
If not found: allocate a new pool from an arena (or a new arena from malloc)

When you free a small object:

Push the block back onto the pool's free list - no syscall
If pool is now empty: return pool to arena's free pool list
If arena is now entirely empty: return arena to OS via free()

This design means that allocating a small Python object typically takes fewer than 30 CPU instructions and zero syscalls.
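The allocate and free steps above can be modelled in a few lines of Python. This is a toy sketch of a single pool, not CPython's implementation: ToyPool and its attributes are hypothetical names, and block addresses are represented as byte offsets into an imaginary 4096-byte pool.

```python
class ToyPool:
    """Toy model of one pool: fixed-size blocks plus a LIFO free list."""

    def __init__(self, block_size, pool_size=4096):
        self.block_size = block_size
        self.capacity = pool_size // block_size
        self.next_offset = 0      # offset of the first never-used block
        self.free_blocks = []     # offsets of freed blocks, most recent last
        self.in_use = 0

    def allocate(self):
        """Return a block offset, or None when the pool is full."""
        if self.free_blocks:
            offset = self.free_blocks.pop()      # reuse the most recently freed block
        elif self.next_offset < self.capacity * self.block_size:
            offset = self.next_offset            # carve a fresh block
            self.next_offset += self.block_size
        else:
            return None
        self.in_use += 1
        return offset

    def free(self, offset):
        """Push a block onto the free list; report whether the pool is now empty."""
        self.free_blocks.append(offset)
        self.in_use -= 1
        return self.in_use == 0


pool = ToyPool(block_size=32)
first = pool.allocate()
second = pool.allocate()
pool.free(first)
reused = pool.allocate()
print(first, second, reused)
```

The freed block is handed out again on the very next allocation, which is why freshly created small objects so often reuse the address of an object that was just destroyed.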

Free Lists: Recycling Common Objects

For extremely frequently allocated types, CPython maintains free lists: per-type stacks of recently freed objects that are reused without going back through pymalloc:

Integer Free List

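The small integer cache (-5 to 256) handles common integers: these are permanent singletons, not a free list. Integers outside that range are ordinary PyLongObject allocations served by pymalloc; unlike float and list, most CPython 3 releases keep no dedicated free list for them: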

// Objects/longobject.c
#define _PyLong_NSMALLNEGINTS 5
#define _PyLong_NSMALLPOSINTS 257
// Small integer singletons (NOT a free list - these are permanent)
static PyLongObject small_ints[_PyLong_NSMALLNEGINTS + _PyLong_NSMALLPOSINTS];
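The effect of the singleton array is visible from Python. The sketch below is specific to CPython and relies on object identity, which is an implementation detail; make_int is a hypothetical helper that builds the integers at run time so the compiler cannot share constants.

```python
def make_int(text):
    """Build an int at run time so the compiler cannot fold or share constants."""
    return int(text)


cached_a, cached_b = make_int("256"), make_int("256")
fresh_a, fresh_b = make_int("257"), make_int("257")

# Inside the cached range both names refer to the same singleton object
print(cached_a is cached_b)
# Outside the range each call produces a separate PyLongObject
print(fresh_a is fresh_b)
```

Identity checks like this are only useful for exploring the interpreter; application code should compare integers with ==.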

Float Free List

// Objects/floatobject.c
#define PyFloat_MAXFREELIST 100
static int numfree = 0;
static PyFloatObject *free_list = NULL;
// When a float is freed: prepend to free_list (if list < 100 items)
// When a float is allocated: pop from free_list if available, else pymalloc

import gc
import timeit

gc.collect()

# Create and delete floats to populate the free list
floats = [float(i) for i in range(100)]
del floats

# 'x = 3.14' would only load a constant, so force a real allocation instead:
# each iteration builds a new float and frees the previous one, which lets
# the allocation reuse a PyFloatObject from the free list
t1 = timeit.timeit('x = y * 2.0', setup='y = 1.5', number=10_000_000)
print(f"Float allocation: {t1:.3f}s")

List Free List

// Objects/listobject.c
#define PyList_MAXFREELIST 80
static PyListObject *free_list[PyList_MAXFREELIST];
static int numfree = 0;

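When a list is deallocated, its ob_item array is released but the PyListObject struct itself is kept on a free list of up to 80 entries. Calling lst.clear() only empties a list; it is deallocation (the refcount reaching zero) that puts the struct on the free list. The next list allocation, such as [], pops a struct from the free list.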

import sys

# Demonstrate the list free list
lst = []
lst_id = id(lst)
del lst

new_lst = []
new_lst_id = id(new_lst)

# On CPython, these will often be the same address (free list reuse)
print(f"Same address: {lst_id == new_lst_id}")  # Often True

# The free list means list creation is very cheap
import timeit
t = timeit.timeit('x = []', number=10_000_000)
print(f"Empty list creation: {t:.3f}s")  # Typically < 0.3s for 10M creates

Reference Counting: The Primary Memory Manager

Reference counting (covered in Lesson 02) is CPython's main mechanism for freeing memory. When ob_refcnt hits zero, the object is freed immediately - not deferred to a GC cycle:

import sys

class TrackDealloc:
    def __init__(self, name):
        self.name = name
        print(f"  Created: {name}")
    def __del__(self):
        print(f"  Freed:   {self.name}")

print("--- Block 1: immediate deallocation ---")
x = TrackDealloc("obj1")
print(f"  refcount: {sys.getrefcount(x) - 1}")  # -1 because getrefcount adds 1
del x  # ob_refcnt hits 0 → tp_dealloc → __del__ called immediately
print("  del completed")
# Output order:
# Created: obj1
# refcount: 1
# Freed: obj1    ← happens immediately on del
# del completed

print("\n--- Block 2: deallocation deferred by alias ---")
y = TrackDealloc("obj2")
alias = y  # refcount now 2
del y      # refcount drops to 1 - NOT freed yet
print("  y deleted, alias still holds reference")
del alias  # refcount drops to 0 → freed NOW
print("  alias deleted")
# Output:
# Created: obj2
# y deleted, alias still holds reference
# Freed: obj2
# alias deleted

The `tp_dealloc` Chain

When ob_refcnt hits zero, _Py_Dealloc(obj) is called, which calls obj->ob_type->tp_dealloc(obj). Each type's dealloc function:

Calls Py_DECREF on all referenced objects (which may trigger further deallocations)
Frees any internal buffers (e.g., PyListObject.ob_item array)
Calls PyObject_Free(obj) or PyObject_GC_Del(obj) to return memory to pymalloc

This is immediate and recursive. Deleting a large nested data structure (del big_dict) can trigger a deep chain of deallocations that takes significant time:

import time

# Build a deeply nested list
depth = 1000
lst = []
current = lst
for i in range(depth):
    new = []
    current.append(new)
    current = new

# Deletion triggers a cascade of 1000 deallocations
start = time.perf_counter()
del lst
elapsed = time.perf_counter() - start
print(f"Deleting {depth}-deep nested list: {elapsed*1000:.3f}ms")
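# The cascade is recursive in C: each list's dealloc drops the next list's last reference
# CPython bounds the C stack depth with its "trashcan" mechanism, which defers
# deeply nested deallocations; sys.getrecursionlimit() does not apply here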

The Cyclic Garbage Collector

Reference counting cannot handle cycles:

import gc

gc.disable()  # Pause automatic GC so we can observe

# Create a cycle
a = {}
b = {}
a['ref'] = b
b['ref'] = a

# a's refcount: 1 (from 'a' variable) + 1 (from b['ref']) = 2
# b's refcount: 1 (from 'b' variable) + 1 (from a['ref']) = 2

del a  # a's refcount: 2 → 1 (still alive, b['ref'] holds it)
del b  # b's refcount: 2 → 1 (still alive, a['ref'] holds it)

# Both objects have refcount 1 but are unreachable from any live name
# Reference counting alone cannot free them

# The cyclic GC detects and frees them:
before = gc.get_count()
print(f"Before collection: {before}")
collected = gc.collect(2)  # Full collection
print(f"Collected: {collected} objects")

gc.enable()

The Three Generations

CPython's cyclic GC uses a generational approach based on the observation that most objects die young (the "generational hypothesis"):

Generation 0 (youngest, collected most often):
  - New container objects start here
  - Collected when (tracked allocations - deallocations) since the last collection exceeds threshold[0] (default: 700)
  - Collection frequency: roughly every 700 net container allocations

Generation 1 (collected less often):
  - Objects that survived a gen0 collection move here
  - Collected when the number of gen0 collections since the last gen1 collection exceeds threshold[1] (default: 10)
  - So: roughly every 10 gen0 collections → 1 gen1 collection

Generation 2 (oldest, collected rarely):
  - Objects surviving gen1 collection move here
  - Collected when the number of gen1 collections since the last gen2 collection exceeds threshold[2] (default: 10)
  - So: roughly every 10 gen1 collections → 1 gen2 collection
  - = At least 700 × 10 × 10 = 70,000 net allocations → 1 gen2 collection

import gc

# Check thresholds
print(gc.get_threshold())   # (700, 10, 10)

# Check current counts
print(gc.get_count())       # (gen0_count, gen1_count, gen2_count)

# Manual collection
gc.collect(0)   # Collect generation 0 only
gc.collect(1)   # Collect generations 0 and 1
gc.collect(2)   # Full collection: all generations
gc.collect()    # Same as gc.collect(2)

# Adjust thresholds (rarely needed, but useful for tuning)
gc.set_threshold(1000, 15, 15)  # Less frequent collection

# Freeze objects so the GC ignores them (3.7+)
gc.freeze()     # Move all tracked objects to a permanent generation that is never scanned
# Useful before forking: prevents COW pages being dirtied by GC
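The threshold cascade can be modelled with three counters. This is a toy sketch under stated assumptions: simulate_collections is a hypothetical helper, every allocation is treated as a net container allocation, and any additional conditions a particular CPython version applies before a full collection are ignored.

```python
def simulate_collections(net_allocations, thresholds=(700, 10, 10)):
    """Count how many collections of each generation the counters would trigger.

    Toy model: one counter per generation, checked from oldest to youngest
    whenever the generation-0 counter exceeds its threshold.
    """
    counts = [0, 0, 0]
    collections = [0, 0, 0]
    for _ in range(net_allocations):
        counts[0] += 1
        if counts[0] <= thresholds[0]:
            continue
        # Collect the oldest generation whose counter is over its threshold
        generation = max(i for i in range(3) if counts[i] > thresholds[i])
        collections[generation] += 1
        if generation < 2:
            counts[generation + 1] += 1   # the next generation saw one more collection
        for i in range(generation + 1):
            counts[i] = 0                 # counters of collected generations restart
    return collections


print(simulate_collections(700))
print(simulate_collections(701))
print(simulate_collections(100_000))
```

Because each check is "counter greater than threshold", the real spacing is slightly wider than the round numbers suggest: the first collection happens on allocation 701, not 700.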

How the Cyclic GC Finds Cycles

The cyclic GC uses a modified mark-and-sweep algorithm. The key mechanism is tp_traverse - every type that can contain references to other Python objects must implement this function:

// From Objects/listobject.c
static int
list_traverse(PyListObject *o, visitproc visit, void *arg)
{
    Py_ssize_t i;
    for (i = Py_SIZE(o); --i >= 0; ) {
        Py_VISIT(o->ob_item[i]);  // Call visit() on each element
    }
    return 0;
}

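// Simplified sketch of dict_traverse from Objects/dictobject.c
// (the real function walks the dict's internal key/value tables directly)
static int
dict_traverse(PyObject *op, visitproc visit, void *arg)
{
    PyObject *key, *value;
    Py_ssize_t pos = 0;
    // Visit each key and value in the dict
    while (PyDict_Next(op, &pos, &key, &value)) {
        Py_VISIT(key);
        Py_VISIT(value);
    }
    return 0;
}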

The GC algorithm (simplified):

1. BUILD: Collect all container objects (those implementing tp_traverse)
   that are in the generation being collected

2. IDENTIFY UNREACHABLE:
   For each object O in the collected set:
     gc_ref = ob_refcnt  // Copy the reference count

   For each object O:
     tp_traverse(O, decrement_gc_ref, NULL)
     // For each object O references, decrement its gc_ref

   // After this: objects with gc_ref > 0 are referenced from OUTSIDE the set
   // Objects with gc_ref == 0 are only referenced from within the set = cycles

3. MARK REACHABLE: Starting from objects with gc_ref > 0,
   traverse and mark all transitively reachable objects as "reachable"

4. SWEEP: Objects remaining unmarked (gc_ref == 0, not reachable) are unreachable
   Call tp_clear() on each: sets all outgoing references to NULL, Py_DECREF each
   This typically brings ob_refcnt to 0, triggering tp_dealloc
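Steps 1 to 3 can be simulated on a toy object graph. The sketch below is a model, not CPython's code: find_unreachable is a hypothetical helper, objects are represented by names, refcounts stands in for ob_refcnt, and edges stands in for what tp_traverse would visit.

```python
def find_unreachable(refcounts, edges):
    """Return the names of objects that are only kept alive by each other.

    refcounts: name -> total reference count (stand-in for ob_refcnt)
    edges:     name -> names it references (stand-in for tp_traverse)
    """
    # Step 2a: copy every reference count into a scratch field
    gc_ref = dict(refcounts)

    # Step 2b: subtract references that come from inside the collected set
    for source, targets in edges.items():
        for target in targets:
            if target in gc_ref:
                gc_ref[target] -= 1

    # Step 3: objects with gc_ref > 0 have outside references; everything
    # reachable from them is alive
    reachable = set()
    stack = [name for name, count in gc_ref.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(t for t in edges.get(name, ()) if t in gc_ref)

    # Step 4 would call tp_clear on whatever is left
    return set(gc_ref) - reachable


# a <-> b is an orphaned cycle; c <-> d is a cycle that a variable still points into
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
unreachable = find_unreachable(refcounts, edges)
print(sorted(unreachable))
```

Object d ends the subtraction phase with a scratch count of zero, yet it survives because it is reachable from c. This is why the marking pass in step 3 is needed and a zero scratch count alone is not proof of garbage.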

import gc

# Only "container" objects (types that implement tp_traverse) are tracked by the GC
# Simple types like int, float, str are never tracked
# (they cannot contain references to other objects)

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

# Verify: Node instances are GC-tracked
n = Node(1)
print(gc.is_tracked(n))     # True - Node has __dict__ which can hold refs
print(gc.is_tracked(42))    # False - int cannot contain references
print(gc.is_tracked("hi"))  # False - str cannot contain references
print(gc.is_tracked([]))    # True - list can contain references

`tracemalloc`: Finding Memory Allocations

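tracemalloc is CPython's built-in allocation tracer. It installs hooks on CPython's memory allocator APIs (PyMem_RawMalloc, PyMem_Malloc, PyObject_Malloc), so it sees both pymalloc-served and malloc-served allocations made through them, and records the Python traceback for every allocation: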

import tracemalloc
import linecache

def display_top(snapshot, key_type='lineno', limit=10):
    """Display top memory-consuming lines."""
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print(f"\nTop {limit} lines by memory:")
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        print(f"#{index}: {frame.filename}:{frame.lineno}: "
              f"{stat.size / 1024:.1f} KiB")
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print(f"    {line}")

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print(f"{len(other)} other: {size / 1024:.1f} KiB")

    total = sum(stat.size for stat in top_stats)
    print(f"Total allocated: {total / 1024 / 1024:.1f} MiB")

# Start tracing with a 10-frame traceback depth
tracemalloc.start(10)

# Take a snapshot BEFORE the allocation
snapshot1 = tracemalloc.take_snapshot()

# The code you want to profile:
data = {}
for i in range(100_000):
    data[str(i)] = [i, i * 2, i * 3]

# Take a snapshot AFTER
snapshot2 = tracemalloc.take_snapshot()

# Show what was allocated between snapshots
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("\nMemory allocated between snapshots:")
for stat in top_stats[:5]:
    print(stat)

# Full analysis
display_top(snapshot2)

tracemalloc.stop()

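For production use, tracemalloc can be left running, but it is not free: it slows allocation-heavy code and stores a trace for every live block, and both costs grow with the traceback depth passed to tracemalloc.start(). Keep the depth small, measure the overhead on your own workload, and take snapshots periodically to track growth: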

import tracemalloc
import time
import threading

class MemoryMonitor:
    """Monitor memory growth in a background thread."""

    def __init__(self, interval=60.0):
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        tracemalloc.start(5)
        self._baseline = tracemalloc.take_snapshot()
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        tracemalloc.stop()

    def _run(self):
        while not self._stop.wait(self.interval):
            snapshot = tracemalloc.take_snapshot()
            stats = snapshot.compare_to(self._baseline, 'lineno')
            total_growth = sum(s.size_diff for s in stats if s.size_diff > 0)
            print(f"[MemoryMonitor] Growth since start: {total_growth / 1024 / 1024:.1f} MiB")
            for stat in stats[:3]:
                if stat.size_diff > 0:
                    print(f"  {stat}")

# Usage:
# monitor = MemoryMonitor(interval=30)
# monitor.start()
# ... your application runs ...
# monitor.stop()
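Before leaving tracing enabled, it helps to see how much memory the tracer itself consumes. The sketch below uses tracemalloc.get_traced_memory() and tracemalloc.get_tracemalloc_memory(), both part of the standard module; the workload (a list of short strings) and the helper name measure_tracing_overhead are arbitrary choices for illustration.

```python
import tracemalloc


def measure_tracing_overhead(n_items=50_000, nframe=5):
    """Return (current, peak, overhead) in bytes for a small sample workload."""
    tracemalloc.start(nframe)
    try:
        data = [str(i) for i in range(n_items)]        # sample allocations to trace
        current, peak = tracemalloc.get_traced_memory()
        overhead = tracemalloc.get_tracemalloc_memory()  # memory used by the tracer itself
        del data
    finally:
        tracemalloc.stop()
    return current, peak, overhead


current, peak, overhead = measure_tracing_overhead()
print(f"traced by Python:   {current / 1024:.0f} KiB (peak {peak / 1024:.0f} KiB)")
print(f"used by tracemalloc: {overhead / 1024:.0f} KiB")
```

Running this with different nframe values on a representative workload gives a concrete basis for choosing the traceback depth.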

`sys.getsizeof` vs Actual Memory

sys.getsizeof() returns the size of the object's C struct itself - it does not recursively count objects the container references:

import sys

# sys.getsizeof lies about containers
lst = [1, 2, 3, 4, 5]
print(sys.getsizeof(lst))          # 104 bytes - just the PyListObject + ob_item array
# Does NOT include the 5 integers!

# Each integer takes additional memory
print(sys.getsizeof(1))            # 28 bytes × 5 = 140 bytes not counted

# Correct deep size measurement:
def deep_sizeof(obj, seen=None):
    """Recursively compute memory of an object and all its references."""
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    seen.add(obj_id)
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                   for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

lst = [1, 2, 3, 4, 5]
print(f"getsizeof:   {sys.getsizeof(lst)}")     # 104
print(f"deep_sizeof: {deep_sizeof(lst)}")        # 104 + 5*28 = 244

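nested = [[i for i in range(100)] for _ in range(100)]
print(f"getsizeof:   {sys.getsizeof(nested)}")   # ~900 bytes: outer list only (exact value varies by version)
print(f"deep_sizeof: {deep_sizeof(nested)}")      # ~95,000: the 100 inner lists dominate; ints 0-99 are shared and counted once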

Memory Fragmentation in Long-Running Processes

Long-running Python services often show RSS growth that does not correspond to Python-visible memory. This is memory fragmentation.

What happens: pymalloc allocates arenas (256KB) from the OS. When objects within an arena are freed, the arena's pool space is reused by Python - but the arena is only returned to the OS when it is completely empty. If even one live object remains in an arena, the entire 256KB stays mapped.

In a service that processes many requests, arenas become partially filled with long-lived objects (connection pools, caches, module globals), preventing them from being returned to the OS. RSS grows even though Python's internal accounting shows reasonable memory usage.

import tracemalloc
import gc

# Diagnose: compare tracemalloc's view with /proc/self/status
def get_rss_mb():
    """Get process RSS in MB (current RSS on Linux; peak RSS in the macOS fallback)."""
    try:
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) / 1024  # value is reported in kB
    except FileNotFoundError:
        pass
    import resource
    # macOS fallback: ru_maxrss is the PEAK RSS, reported in bytes on macOS
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024 * 1024)

tracemalloc.start()

# Simulate allocation pattern: allocate, partially free
big_list = [object() for _ in range(100_000)]
gc.collect()

rss_full = get_rss_mb()

# Free most objects but keep some
survivors = big_list[::100]  # Keep every 100th object = 1000 survivors
del big_list
gc.collect()

rss_after = get_rss_mb()
# Measure traced memory AFTER freeing so it is comparable with rss_after
py_mem = tracemalloc.get_traced_memory()[0] / 1024 / 1024
print(f"RSS with 100K objects:   {rss_full:.1f} MB")
print(f"RSS after freeing 99%:   {rss_after:.1f} MB")
print(f"Python-tracked memory:   {py_mem:.1f} MB")
print(f"RSS not traced by Python: ~{rss_after - py_mem:.1f} MB")
# This gap includes the interpreter's baseline as well as fragmentation;
# RSS stays elevated because arenas containing survivors cannot be freed

tracemalloc.stop()
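Why scattered survivors are so costly can be shown with a small model. The sketch assumes blocks were handed out sequentially, so block i lives in arena i // blocks_per_arena; real placement is messier. The helper name pinned_arenas is hypothetical, and the 64 × 128 figure reuses the 32-byte-block numbers from earlier in this lesson.

```python
def pinned_arenas(survivor_indices, blocks_per_arena):
    """Count arenas that still hold at least one surviving block.

    Assumes sequential placement: block i lives in arena i // blocks_per_arena.
    """
    return len({index // blocks_per_arena for index in survivor_indices})


BLOCKS_PER_ARENA = 64 * 128   # 64 pools x 128 blocks of 32 bytes (figures from the text)
total_objects = 100_000
total_arenas = -(-total_objects // BLOCKS_PER_ARENA)   # ceiling division

scattered = range(0, total_objects, 100)   # every 100th object survives
clustered = range(1_000)                   # the first 1000 objects survive

print(f"arenas allocated:            {total_arenas}")
print(f"pinned, scattered survivors: {pinned_arenas(scattered, BLOCKS_PER_ARENA)}")
print(f"pinned, clustered survivors: {pinned_arenas(clustered, BLOCKS_PER_ARENA)}")
```

The same number of survivors pins every arena when they are spread out and a single arena when they sit together. Allocating long-lived objects early, before request-scoped data, therefore tends to help.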

Mitigation strategies:

Use gc.freeze() before forking - prevents copy-on-write page dirtying
Use __slots__ for frequently-created objects to reduce per-object allocator overhead
Use mmap-based storage (NumPy arrays, shared memory) for large numeric data
For very large caches, use Redis or another out-of-process cache to keep them outside Python's heap
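Two of these strategies are easy to try with only the standard library. The sketch below is illustrative: PlainPoint and SlottedPoint are hypothetical classes, and array('q') is used as a stdlib stand-in for NumPy-style packed storage.

```python
import sys
from array import array


class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")   # no per-instance __dict__ is created

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainPoint(1, 2)
slotted = SlottedPoint(1, 2)
print("plain has __dict__:  ", hasattr(plain, "__dict__"))
print("slotted has __dict__:", hasattr(slotted, "__dict__"))

# array('q') stores signed 64-bit values inline instead of pointers to int objects
n = 100_000
packed = array("q", range(10_000_000, 10_000_000 + n))
boxed = list(range(10_000_000, 10_000_000 + n))
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(v) for v in boxed)
print(f"array('q'):         {sys.getsizeof(packed):,} bytes")
print(f"list + int objects: {boxed_bytes:,} bytes")
```

Packed storage brings the opening scenario close to the naive 8-bytes-per-ID estimate, at the cost of only being able to hold values that fit in 64 bits.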

Interview Q&A

Q1: How does CPython's memory allocator work? Describe the layers.

CPython uses a layered allocation system. At the bottom, the OS provides pages (4KB on Linux/macOS) via mmap or brk. The C library (malloc) sits above this and provides general-purpose heap allocation with its own free list management.

Above malloc sits pymalloc - CPython's custom small-object allocator in Objects/obmalloc.c. pymalloc handles all allocations of 512 bytes or fewer. It requests large chunks (256KB "arenas") from malloc, then subdivides them into 4KB "pools". Each pool is dedicated to a single "size class" (8, 16, 24, ... 512 bytes in 8-byte steps). Within a pool, blocks are allocated from a free list. This design means small object allocation requires no syscall and no malloc call - just a pointer pop from a pool's free list.

Above pymalloc, type-specific allocators call into it for their allocations. PyObject_New(PyLongObject) calls pymalloc for the 28-byte allocation. Objects larger than 512 bytes (large strings, large bytearrays, numpy arrays' internal data) call malloc directly.

Q2: What is a reference cycle, and how does CPython detect and collect it?

A reference cycle occurs when a group of objects reference each other in a loop such that no external name points to any of them. The simplest case: a = {}; b = {}; a['ref'] = b; b['ref'] = a; del a; del b. After deleting both names, the two dicts still have reference counts of 1 (each holds a reference to the other), but neither is reachable from any live Python name. Reference counting alone cannot detect this - the counts never hit zero.

CPython's cyclic GC handles this. It maintains "tracked" sets for container objects (those that can hold references to other objects) across three generations. Periodically, it runs the collection algorithm: for every tracked object, it copies ob_refcnt to a scratch field gc_ref. Then it traverses all references (using tp_traverse) and decrements gc_ref for each referenced object. Objects with gc_ref > 0 after this step are referenced from outside the tracked set and are alive. Starting from these, the GC marks transitively reachable objects as alive. Any remaining objects with gc_ref == 0 are unreachable cycles and are freed by calling tp_clear on each, which severs their internal references and causes ob_refcnt to hit zero, triggering normal deallocation.

Q3: When is an object freed in CPython? Is it always freed immediately?

In most cases, yes - CPython frees objects immediately when their reference count hits zero. del x calls Py_DECREF on x's object; if ob_refcnt drops to zero, _Py_Dealloc is called synchronously, which calls tp_dealloc, which frees the object's internal buffers and returns memory to pymalloc (or free()). This happens in the same Python opcode execution, not asynchronously.

Two exceptions: (1) Cyclic garbage - objects in reference cycles never have their count hit zero through reference counting alone. They are only freed when the cyclic GC collects them, which is triggered once net container allocations exceed the generation-0 threshold (700 by default). (2) Free lists - some types (float, list, dict, tuple) have free lists. When their refcount hits zero, instead of truly freeing the memory, the object is pushed onto a type-specific free list (up to a max size). The memory is reused for the next allocation of the same type. This means del my_float does not immediately return 24 bytes to pymalloc.

Q4: What does sys.getsizeof() actually measure? When does it mislead you?

sys.getsizeof() returns the size in bytes of the Python object struct itself - the result of calling __sizeof__() plus the GC overhead header (if applicable). For a list, it returns the PyListObject struct size plus the ob_item pointer array size, but does NOT include the memory used by the list's elements. For a dict, it returns the dict struct plus the hash table entry array, but not key or value objects.

It misleads in several ways: (1) sys.getsizeof([1, 2, 3]) is 88 bytes, which covers the list struct and its ob_item pointer array but not the three integer objects, each of which occupies another 28 bytes. (2) It does not count strings referenced by co_consts or other internally held objects. (3) It does not include OS-level allocator overhead (alignment, metadata) or arena fragmentation.

For accurate total memory, use tracemalloc to take before/after snapshots around the code that creates your data structure, or write a recursive deep_sizeof function that traverses all referenced objects. For process-level memory, read /proc/self/status (RSS) on Linux or use resource.getrusage on macOS.

Q5: What is gc.freeze() and when would you use it?

gc.freeze() (added in Python 3.7) moves all currently tracked objects from generations 0, 1, and 2 into a "frozen" generation that is never collected. Frozen objects are still alive and usable, but they are permanently excluded from future GC cycles.

The primary use case is forking. When a Python process forks (e.g., a multiprocessing worker or a Gunicorn pre-fork web server), the child process shares the parent's memory pages via copy-on-write. If the GC runs in the child process after forking, it traverses all tracked objects and writes to their GC header fields (the gc_ref scratch count), which dirties those memory pages and triggers copy-on-write for every page that holds a tracked object. This can cause significant memory overhead for each worker.

By calling gc.freeze() before forking, you prevent post-fork GC cycles from touching the pre-fork objects. New objects created after the fork accumulate in generations 0-2 normally, but the pre-fork object graph is never traversed. This is the recommended pattern for Gunicorn, uWSGI, and similar pre-fork servers running Python applications.
